
Non-degeneracy condition



Why Do Pretrained Language Models Help in Downstream Tasks? An Analysis of Head and Prompt Tuning

Neural Information Processing Systems

Pretrained language models have achieved state-of-the-art performance when adapted to a downstream NLP task. However, theoretical analysis of these models is scarce and challenging since the pretraining and downstream tasks can be very different. We propose an analysis framework that links the pretraining and downstream tasks with an underlying latent variable generative model of text -- the downstream classifier must recover a function of the posterior distribution over the latent variables. We analyze head tuning (learning a classifier on top of the frozen pretrained model) and prompt tuning in this setting. The generative model in our analysis is either a Hidden Markov Model (HMM) or an HMM augmented with a latent memory component, motivated by long-term dependencies in natural language. We show that 1) under certain non-degeneracy conditions on the HMM, simple classification heads can solve the downstream task, 2) prompt tuning obtains downstream guarantees with weaker non-degeneracy conditions, and 3) our recovery guarantees for the memory-augmented HMM are stronger than for the vanilla HMM because task-relevant information is easier to recover from the long-term memory. Experiments on synthetically generated data from HMMs back our theoretical findings.
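The framework above asks the downstream classifier to recover a function of the posterior over HMM latent states. A minimal sketch of computing such a posterior with the normalized forward algorithm is below; the two-state HMM parameters are hypothetical toy values, not taken from the paper.

```python
import numpy as np

def forward_posterior(pi, A, B, obs):
    """Posterior over the final hidden state given the observations,
    computed with the (normalized) forward algorithm."""
    alpha = pi * B[:, obs[0]]
    alpha /= alpha.sum()
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]   # propagate, then weight by emission
        alpha /= alpha.sum()            # normalize to a distribution
    return alpha

# Toy 2-state HMM (hypothetical parameters).
pi = np.array([0.6, 0.4])               # initial state distribution
A  = np.array([[0.7, 0.3],              # transition matrix A[i, j] = P(j | i)
               [0.2, 0.8]])
B  = np.array([[0.9, 0.1],              # emission matrix B[i, o] = P(o | i)
               [0.3, 0.7]])

post = forward_posterior(pi, A, B, [0, 1, 1])
# A downstream "head" in the paper's sense would be a classifier
# applied on top of a representation carrying this posterior.
```

In this sketch the observations `[0, 1, 1]` shift the posterior mass toward state 1, which emits symbol 1 with high probability.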






Beyond Non-Degeneracy: Revisiting Certainty Equivalent Heuristic for Online Linear Programming

Chen, Yilun, Wang, Wenjia

arXiv.org Artificial Intelligence

The Certainty Equivalent heuristic (CE) is a widely used algorithm for various dynamic resource allocation problems in OR and OM. Despite its popularity, existing theoretical guarantees of CE are limited to settings satisfying restrictive fluid regularity conditions, particularly the non-degeneracy conditions, under the widely held belief that the violation of such conditions leads to performance deterioration and necessitates algorithmic innovation beyond CE. In this work, we conduct a refined performance analysis of CE within the general framework of online linear programming. We show that CE achieves uniformly near-optimal regret (up to a polylogarithmic factor in $T$) under only mild assumptions on the underlying distribution, without relying on any fluid regularity conditions. Our result implies that, contrary to prior belief, CE effectively beats the curse of degeneracy for a wide range of problem instances with continuous conditional reward distributions, highlighting the distinction of the problem's structure between discrete and non-discrete settings. Our explicit regret bound interpolates between the mild $(\log T)^2$ regime and the worst-case $\sqrt{T}$ regime with a parameter $\beta$ quantifying the minimal rate of probability accumulation of the conditional reward distributions, generalizing prior findings in the multisecretary setting. To achieve these results, we develop novel algorithmic analytical techniques. Drawing tools from empirical process theory, we establish strong concentration analysis of the solutions to random linear programs, leading to improved regret analysis under significantly relaxed assumptions. These techniques may find potential applications in broader online decision-making contexts.
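As a rough illustration of the CE idea, here is a minimal single-resource (multisecretary-style) sketch, not the paper's general online-LP formulation: at each step CE re-solves the fluid problem with the empirical reward distribution and the remaining budget, which in this special case reduces to accepting a request iff its reward clears the empirical quantile implied by the remaining budget-to-horizon ratio. All names and the toy setup are hypothetical.

```python
import random

def ce_online(rewards, budget):
    """Certainty Equivalent heuristic, single-resource sketch:
    accept a request iff its reward is at or above the empirical
    quantile that would exactly exhaust the remaining budget over
    the remaining horizon."""
    accepted = []
    history = []                      # rewards observed so far
    T = len(rewards)
    for t, r in enumerate(rewards):
        history.append(r)
        if budget <= 0:
            continue
        remaining = T - t             # requests left, including this one
        ratio = budget / remaining    # fraction of the future we can serve
        # CE step: threshold = empirical (1 - ratio)-quantile of rewards.
        k = max(0, min(len(history) - 1, int((1 - ratio) * len(history))))
        threshold = sorted(history)[k]
        if r >= threshold:
            accepted.append(r)
            budget -= 1
    return accepted

random.seed(0)
stream = [random.random() for _ in range(200)]
picks = ce_online(stream, budget=50)
```

The re-solving at every step, rather than fixing a threshold once, is the "certainty equivalent" ingredient: the heuristic always acts as if the empirical distribution were the truth.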


Reviews: Information Theoretic Properties of Markov Random Fields, and their Algorithmic Applications

Neural Information Processing Systems

This paper is concerned with learning Markov random fields (MRFs). It is a theoretical paper that ultimately focuses on proving a particular statement: given a variable X in an MRF and some subset of its Markov blanket variables B, there exists another variable Y that remains conditionally dependent on X given that subset. In general this statement is not true, so the goal here is to identify conditions under which it holds. Most of the paper centers on this result, from which the ability to learn an MRF follows. The paper is mostly technical; my main complaint is that I do not find it very intuitive. Central to the results is the non-degeneracy assumption, which I believe should be explained in higher-level terms.


On Mixtures of Markov Chains

Gupta, Rishi, Kumar, Ravi, Vassilvitskii, Sergei

Neural Information Processing Systems

We study the problem of reconstructing a mixture of Markov chains from the trajectories generated by random walks through the state space. Under mild non-degeneracy conditions, we show that we can uniquely reconstruct the underlying chains by only considering trajectories of length three, which represent triples of states. Our algorithm is spectral in nature, and is easy to implement.
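A minimal sketch of the length-three statistics this kind of algorithm consumes: sample triples of states from a hypothetical two-chain mixture and accumulate the empirical triple tensor. The spectral step that actually separates the chains from the tensor's slices is omitted here; the mixture parameters are illustrative, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical mixture of two Markov chains on n = 3 states.
chains = [np.array([[0.80, 0.10, 0.10],
                    [0.10, 0.80, 0.10],
                    [0.10, 0.10, 0.80]]),
          np.array([[0.10, 0.45, 0.45],
                    [0.45, 0.10, 0.45],
                    [0.45, 0.45, 0.10]])]
n = 3

def sample_triple(rng):
    """Draw one length-three trajectory (a triple of states)."""
    P = chains[rng.integers(2)]      # pick one chain uniformly
    s0 = rng.integers(n)             # uniform start state
    s1 = rng.choice(n, p=P[s0])
    s2 = rng.choice(n, p=P[s1])
    return s0, s1, s2

# Empirical triple tensor: T3[i, j, k] ~ P(trajectory = i -> j -> k).
# A spectral reconstruction would operate on its slices T3[:, j, :].
T3 = np.zeros((n, n, n))
for _ in range(20000):
    i, j, k = sample_triple(rng)
    T3[i, j, k] += 1
T3 /= T3.sum()
```

The point of the abstract's "trajectories of length three" claim is that, under the non-degeneracy conditions, this tensor already determines the mixture uniquely.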

